TensorFlow CNN model

Import the required libraries; this step might take some time.
%%time
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
import tensorflow.keras.layers as layers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import helpful_functions as HF
%matplotlib inline
Wall time: 5.79 s
## check the tensorflow version that we have
tf.__version__
'2.6.0'
## check if a GPU is available for training
num_gpu = len(tf.config.list_physical_devices('GPU'))
print(f'The number of available GPUs for training = {num_gpu}')
if num_gpu:
print('Training on GPU!')
else:
print('Training on CPU!')
The number of available GPUs for training = 1
Training on GPU!
tensorflow_datasets module

Since we need some data to train a predictive model on, we can leverage the tensorflow_datasets module and load one of its preprocessed datasets with a few lines of code; specifically, we will load the Fashion MNIST dataset.
Here is a breakdown and an explanation for the parameters that we have to set while using the tensorflow_datasets API:
- name = the name of the dataset to load, chosen among the datasets made available by the tensorflow_datasets module
- split = how to split the data into training and test records; we can also control the percentage of records in each split if we would like
- data_dir = the directory into which the data should be downloaded; if the data is already available in that directory, it won't be downloaded again
- as_supervised = a Boolean controlling whether to include the labels with the features; if set to True, each example is represented by a tuple of length 2, where the first element is the features and the second element is the label
- batch_size = an integer representing the size of each batch of data; set it to -1 to return all the records in the dataset at once, and then we will manage the batch size ourselves during training. Setting it to -1 might show a warning message, but don't worry about it for now.
- with_info = a Boolean controlling whether to include the metadata of this dataset; if set to False, only the dataset is returned, and if set to True, a tuple with the structure (Dataset, Metadata) is returned

## loading the dataset and its metadata
dataset, metadata = tfds.load(
'fashion_mnist',
split = ['train', 'test'],
data_dir='./data',
as_supervised=True,
batch_size=-1,
with_info=True)
## converting the dataset into numpy arrays and tuple unpacking
## the dataset into training data and testing data
(train_images, train_labels), (test_images, test_labels) = tfds.as_numpy(dataset)
WARNING:tensorflow:From C:\Users\MinaNagib\anaconda3\envs\dl\lib\site-packages\tensorflow_datasets\core\dataset_builder.py:643: get_single_element (from tensorflow.python.data.experimental.ops.get_single_element) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.get_single_element()`.
class_names = metadata.features['label'].names
print("Class names: {}".format(class_names))
Class names: ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
The 28 x 28 pixel images have pixel values ranging from 0 to 255. We will normalize these values to the range 0 to 1, which helps make training faster and more stable.
## normalizing the data
train_images = train_images / 255
test_images = test_images / 255
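As a quick sanity check, here is a small sketch (using synthetic uint8 data, since it only illustrates the scaling) showing that dividing by 255 maps the pixel values into the [0, 1] range and promotes them to floats:

```python
import numpy as np

## synthetic "images": uint8 pixels in [0, 255], shaped like Fashion MNIST
fake_images = np.random.randint(0, 256, size=(4, 28, 28, 1), dtype=np.uint8)

## the same normalization applied above; true division promotes to float
scaled = fake_images / 255

print(scaled.dtype)                              # float64
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
```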
## plotting the first 2 images
for image, label in list(zip(train_images[:2], train_labels[:2])):
## converting the image from a 3d array of 28 x 28 x 1 shape
## to a 2d array of 28 x 28 shape
image = np.squeeze(image)
## Plotting the images
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.xlabel(class_names[label])
plt.show()
Split the training data into train and validation datasets
from sklearn.model_selection import train_test_split
train_images, valid_images, train_labels, valid_labels = train_test_split(
train_images, train_labels, test_size=0.2, random_state=42, stratify=train_labels)
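With stratify set, each class keeps the same proportion in both splits. A small sketch with synthetic, perfectly balanced labels (standing in for Fashion MNIST's 6,000 examples per class) illustrates this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

## synthetic dataset: 1000 samples, 10 classes, 100 examples each
X = np.zeros((1000, 1))
y = np.repeat(np.arange(10), 100)

X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

## each class contributes exactly 80 training and 20 validation samples
print(np.bincount(y_tr))  # [80 80 80 80 80 80 80 80 80 80]
print(np.bincount(y_va))  # [20 20 20 20 20 20 20 20 20 20]
```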
Time to build our CNN TensorFlow model. I will leverage the high-level Keras API, as it makes life much easier.
tf.random.set_seed(42)
np.random.seed(42)
model = keras.Sequential([
layers.Conv2D(filters=32,
kernel_size=3,
strides=1,
padding='same',
activation='relu',
input_shape=(28, 28, 1)),
layers.MaxPool2D(pool_size=2, strides=2),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPool2D(2, 2),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(10, activation='softmax')])
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 28, 28, 32) 320 _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 14, 14, 32) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 14, 14, 64) 18496 _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64) 0 _________________________________________________________________ flatten (Flatten) (None, 3136) 0 _________________________________________________________________ dense (Dense) (None, 128) 401536 _________________________________________________________________ dense_1 (Dense) (None, 10) 1290 ================================================================= Total params: 421,642 Trainable params: 421,642 Non-trainable params: 0 _________________________________________________________________
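The parameter counts in the summary can be reproduced by hand: a Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * filters parameters (the +1 is the bias), and a Dense layer has (inputs + 1) * units:

```python
## Conv2D(32, 3) on a 1-channel input: (3*3*1 + 1) * 32
conv1 = (3 * 3 * 1 + 1) * 32        # 320
## Conv2D(64, 3) on the 32-channel feature maps: (3*3*32 + 1) * 64
conv2 = (3 * 3 * 32 + 1) * 64       # 18496
## after two 2x2 poolings the 28x28 maps shrink to 7x7, so Flatten gives 7*7*64
flat = 7 * 7 * 64                   # 3136
## Dense(128): (3136 + 1) * 128, and Dense(10): (128 + 1) * 10
dense1 = (flat + 1) * 128           # 401536
dense2 = (128 + 1) * 10             # 1290

print(conv1 + conv2 + dense1 + dense2)  # 421642, matching the summary
```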
As we can see, the number of trainable parameters in the CNN model is much larger than in the vanilla NN. However, the model will require far fewer epochs to train than the vanilla NN.
Once the model's architecture is defined, we need to compile the model. During this step we will need to define:
- Optimizer: a method for updating the model's parameters ("weights") after each training iteration
- Loss Function: also known as the Cost Function, which measures how far the model's predictions are from the true values. The training process consists of the optimizer trying to minimize the loss function.
- Metrics: any metrics we would like the model to report during training; these help us evaluate the model's performance, as they are more meaningful to us

## note: we haven't specified a learning rate here
## so the model will be trained with the default learning rate
model.compile(optimizer='adam',
loss=keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
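SparseCategoricalCrossentropy takes integer class labels (rather than one-hot vectors) and is just the negative log of the probability the model assigns to the true class. A minimal numpy sketch with made-up softmax outputs:

```python
import numpy as np

## made-up softmax outputs for 3 examples over 4 classes (each row sums to 1)
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.10, 0.80, 0.05, 0.05],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1, 3])  # integer class indices, not one-hot vectors

## per-example loss: -log(probability assigned to the true class)
losses = -np.log(probs[np.arange(len(labels)), labels])
print(losses.mean())  # mean over the batch, as Keras reports it
```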
After defining the model's architecture and compiling the model, it's time to train it. Training is the process in which the model learns the mapping from the inputs to the outputs. To train the model, we use the .fit() method. The parameters defined below are:
- x: the input data representing the features; in our case, the images
- y: the corresponding labels for the inputs; these are the classes that each image belongs to
- validation_data: a tuple-like object; at a high level, this is a subset of the data that the model does not train on and that is not used to adjust the model's parameters. It is used to evaluate the model during training and to manually tune the model's hyperparameters.
- epochs: the number of training iterations over the full dataset
- shuffle: a Boolean controlling whether to shuffle the data before each training iteration
- batch_size: the number of training examples the model is fed before updating its parameters

%%time
history = model.fit(x=train_images,
y=train_labels,
validation_data=(valid_images, valid_labels),
epochs=10,
shuffle=True,
batch_size=64)
Epoch 1/10 750/750 [==============================] - 10s 9ms/step - loss: 0.4599 - accuracy: 0.8344 - val_loss: 0.3233 - val_accuracy: 0.8843 Epoch 2/10 750/750 [==============================] - 7s 9ms/step - loss: 0.2996 - accuracy: 0.8922 - val_loss: 0.2876 - val_accuracy: 0.8982 Epoch 3/10 750/750 [==============================] - 7s 9ms/step - loss: 0.2522 - accuracy: 0.9085 - val_loss: 0.2412 - val_accuracy: 0.9127 Epoch 4/10 750/750 [==============================] - 7s 9ms/step - loss: 0.2193 - accuracy: 0.9197 - val_loss: 0.2518 - val_accuracy: 0.9079 Epoch 5/10 750/750 [==============================] - 7s 9ms/step - loss: 0.1947 - accuracy: 0.9289 - val_loss: 0.2246 - val_accuracy: 0.9173 Epoch 6/10 750/750 [==============================] - 7s 9ms/step - loss: 0.1697 - accuracy: 0.9374 - val_loss: 0.2086 - val_accuracy: 0.9250 Epoch 7/10 750/750 [==============================] - 7s 9ms/step - loss: 0.1508 - accuracy: 0.9438 - val_loss: 0.2253 - val_accuracy: 0.9219 Epoch 8/10 750/750 [==============================] - 7s 9ms/step - loss: 0.1322 - accuracy: 0.9513 - val_loss: 0.2237 - val_accuracy: 0.9247 Epoch 9/10 750/750 [==============================] - 7s 9ms/step - loss: 0.1137 - accuracy: 0.9587 - val_loss: 0.2259 - val_accuracy: 0.9248 Epoch 10/10 750/750 [==============================] - 7s 9ms/step - loss: 0.0966 - accuracy: 0.9648 - val_loss: 0.2339 - val_accuracy: 0.9260 Wall time: 1min 12s
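The 750 steps per epoch in the log above follow directly from the split and batch sizes: the 60,000 training images minus the 20% validation split leave 48,000 examples, and 48,000 / 64 = 750 batches:

```python
import math

total_train = 60_000                          # Fashion MNIST training examples
after_split = int(total_train * (1 - 0.2))    # 48000 after the validation split
steps_per_epoch = math.ceil(after_split / 64) # partial final batches still count
print(steps_per_epoch)  # 750
```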
Once the model completes the training process, it's time to evaluate it and its predictions. The History object returned by .fit() has a history attribute which can be used to load information about the model's performance during training.
history.history.keys()
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
HF.plot_double_graph(history.history['loss'],
history.history['val_loss'],
history.history['accuracy'],
history.history['val_accuracy'])
test_loss, test_accuracy = model.evaluate(x=test_images, y=test_labels, batch_size=100)
print(f'Loss for the test dataset = {test_loss:.4f}')
print(f'Accuracy over the test dataset = {test_accuracy:.4f}')
100/100 [==============================] - 1s 6ms/step - loss: 0.2611 - accuracy: 0.9181 Loss for the test dataset = 0.2611 Accuracy over the test dataset = 0.9181
We can see the model has achieved around 92% accuracy after training for just 10 epochs compared to the 89% accuracy achieved by the Vanilla NN after training for 50 epochs.
HF.get_report(model, test_images, test_labels, class_names)
| True_Class | Class_Name | Percent_Correct | Num_Correct | Num_Total |
|---|---|---|---|---|
| 0 | T-shirt/top | 0.829 | 829 | 1000 |
| 1 | Trouser | 0.983 | 983 | 1000 |
| 2 | Pullover | 0.873 | 873 | 1000 |
| 3 | Dress | 0.935 | 935 | 1000 |
| 4 | Coat | 0.915 | 915 | 1000 |
| 5 | Sandal | 0.976 | 976 | 1000 |
| 6 | Shirt | 0.735 | 735 | 1000 |
| 7 | Sneaker | 0.981 | 981 | 1000 |
| 8 | Bag | 0.989 | 989 | 1000 |
| 9 | Ankle boot | 0.965 | 965 | 1000 |
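HF.get_report is a helper from the accompanying helpful_functions module; a hedged numpy sketch of the same per-class breakdown (using tiny made-up labels and predictions, since we are not re-running the model here) might look like:

```python
import numpy as np

def per_class_accuracy(true_labels, pred_labels, n_classes):
    """Fraction of correct predictions for each class, as in the table above."""
    report = {}
    for c in range(n_classes):
        mask = true_labels == c
        report[c] = (pred_labels[mask] == c).mean()
    return report

## tiny made-up example: class 0 gets 2/3 right, class 1 gets 1/2 right
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 0])
print(per_class_accuracy(y_true, y_pred, n_classes=2))
```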
We can see the model is still not performing well with some of the shirt examples, so let's visualize some of them.
shirt_idx = pd.Series(test_labels == 6)[test_labels == 6].index[:20]
HF.plot_images(images = test_images[shirt_idx],
class_names = class_names,
true_labels = test_labels[shirt_idx],
model_probs = model.predict(test_images[shirt_idx]))